extract(hpc): scitex.hpc → scitex-hpc v0.1.0 (generic SLURM dispatch) #258
Open
ywatanabe1989 wants to merge 10 commits into develop
scitex_dev.test_runner's HPC dispatch (run_hpc_srun/sbatch/sync/poll/fetch) was generic SLURM code that didn't belong in dev tooling. Extracted to standalone scitex-hpc v0.1.0 package: https://github.com/ywatanabe1989/scitex-hpc
Public API in scitex.hpc:
- JobConfig (dataclass with SCITEX_HPC_* env-var override resolution)
- srun (blocking interactive)
- sbatch (async, returns job ID)
- sync (rsync local → host)
- poll_job (sacct status)
- fetch_result (scp .out file back)
Login nodes never run compute: every command goes through bash -lc to load SLURM modules, then srun/sbatch.
scitex.hpc is scitex_hpc: True (verified)
scitex-hpc tests: 12/12 pass
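The two mechanisms named above (env-var overrides on a config dataclass, and wrapping every command in bash -lc before srun) can be sketched as follows. This is a minimal illustration, not the scitex-hpc implementation: JobConfigSketch, its field names, its defaults, and build_srun_command are all hypothetical; only the SCITEX_HPC_* prefix and the bash -lc wrapping come from the description.

```python
import os
import shlex
from dataclasses import dataclass, fields


@dataclass
class JobConfigSketch:
    """Hypothetical stand-in for JobConfig; fields and defaults are assumptions."""

    host: str = "login-node"
    partition: str = "cpu"
    time: str = "00:10:00"

    @classmethod
    def from_env(cls, env=None, **direct):
        """Directly passed values win, then SCITEX_HPC_<FIELD>, then defaults."""
        env = os.environ if env is None else env
        values = {}
        for f in fields(cls):
            env_key = f"SCITEX_HPC_{f.name.upper()}"
            if f.name in direct:
                values[f.name] = direct[f.name]
            elif env_key in env:
                values[f.name] = env[env_key]
        return cls(**values)


def build_srun_command(cfg, command):
    """Build an ssh argv that runs srun inside a login shell on the host.

    bash -lc starts a login shell, so the profile scripts that load the
    SLURM modules run before srun is looked up.
    """
    srun = ["srun", f"--partition={cfg.partition}", f"--time={cfg.time}", *command]
    remote = "bash -lc " + shlex.quote(shlex.join(srun))
    return ["ssh", cfg.host, remote]


cfg = JobConfigSketch.from_env(env={"SCITEX_HPC_PARTITION": "cascade"})
cmd = build_srun_command(cfg, ["hostname"])
print(cmd)
```

The returned list is meant to be handed to subprocess.run as-is, so the only quoting that matters is the single shlex.quote around the remote command.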
… trio
§6c: value-precedence cascade (direct → yaml → env → default) via scitex_config.PriorityConfig. CLI flags always win; do not hand-roll.
§9: observation/dry-run/execute pattern for mutating commands:
- Mode flags: default observation, --dry-run preview, --<verb>-<scope> execute
- Flag-naming: name action by scope (--update-hosts not --apply)
- --reference names source-of-truth for state-converging ops
- Filter flags use plural scope nouns (--hosts, --packages)
- Dry-run is enforced via manifest gate (canonical: scitex-dev rename-symbols)
Audit checklist updated with the new requirements.
… order
Adds 'Resolution precedence' section listing the cascade (direct → yaml → env → default) and pointing to the canonical scitex_config.PriorityConfig implementation. Cross-references 03_interface_02_cli.md §6c.
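For readers unfamiliar with the cascade, the order direct → yaml → env → default can be illustrated in a few lines. This is only a sketch of the precedence rule; resolve_value and its parameters are hypothetical and are not the scitex_config.PriorityConfig API, which the commit says should be used instead of a hand-rolled version.

```python
import os


def resolve_value(key, direct=None, yaml_values=None, env=None,
                  default=None, env_prefix="SCITEX_HPC_"):
    """Return the first value found in the order direct -> yaml -> env -> default."""
    # 1. A directly passed value (for example a CLI flag) always wins.
    if direct is not None:
        return direct
    # 2. Then a value loaded from a YAML config, passed in here as a dict.
    if yaml_values and key in yaml_values:
        return yaml_values[key]
    # 3. Then an environment variable named <prefix><KEY>.
    env = os.environ if env is None else env
    env_key = env_prefix + key.upper()
    if env_key in env:
        return env[env_key]
    # 4. Finally the built-in default.
    return default


fake_env = {"SCITEX_HPC_PARTITION": "from-env"}
print(resolve_value("partition", direct="from-cli", env=fake_env))
print(resolve_value("partition", yaml_values={"partition": "from-yaml"}, env=fake_env))
print(resolve_value("partition", env=fake_env, default="from-default"))
print(resolve_value("partition", env={}, default="from-default"))
```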
Augments 06_skills_04_editable-installation.md with a concrete pre-tag verification step that catches the silent setuptools failure mode where SKILL.md is on git but absent from the wheel.
Real instance: scitex-hpc 0.6.1 (2026-04-28). The package-data entry was missing, the build succeeded and CI was green, but the wheel didn't contain the SKILL.md. Caught at the unzip -l step before tagging; shipped 0.6.2 the same day with the fix.
Adds: why setuptools' packages.find doesn't auto-include markdown data, the 5-second 'unzip -l' pre-tag check with expected output shape, and the post-install belt-and-suspenders verification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
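The same pre-tag check can be scripted instead of eyeballing unzip -l output. A wheel is a zip archive, so the standard-library zipfile module can list its members. The helper name find_in_wheel and the example wheel path are hypothetical; this is a sketch of the idea, not the check documented in the skill file.

```python
import zipfile


def find_in_wheel(wheel_path, suffix="SKILL.md"):
    """Return the archive members of a wheel whose names end with `suffix`."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.endswith(suffix)]


def assert_wheel_ships(wheel_path, suffix="SKILL.md"):
    """Raise before tagging if the data file never made it into the wheel."""
    matches = find_in_wheel(wheel_path, suffix)
    if not matches:
        raise SystemExit(f"{suffix} missing from {wheel_path}; check package-data")
    return matches
```

Usage, assuming a wheel has already been built into dist/ (the filename is illustrative): assert_wheel_ships("dist/scitex_hpc-0.6.2-py3-none-any.whl").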
…ntries
Two general/* leaves existed on disk but weren't linked from SKILL.md
(skill-discovery agents couldn't find them):
- 04_docs_03_rtd.md (Read the Docs onboarding)
- 99_quality_03_packaging-bar.md (packaging quality bar)
98_quality_01_failure-playbook.md gains three entries from concrete
incidents during the 2026-04-27/28 multi-tenant scitex-hpc + sac
rollout, all documented with symptom / root cause / fix / where-found:
§8 a2a-sdk + protobuf 6.x — FieldDescriptor.label AttributeError
(caught CI red on sac develop; fix: protobuf<6, not <7)
§9 SLURM cgroup kills tmux spawned by srun --overlap
(caught Phase 4 prototype; fix: tmux as PID 1 of sbatch script)
§10 Chatty login-shell banners break SLURM-output parsing
(caught book() polling forever on Spartan; fix: parse line-by-line
against a known SLURM-state vocabulary)
Future agents hitting any of these symptoms now have the playbook entry
to point them straight at the fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
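The §10 fix (parse line-by-line against a known SLURM-state vocabulary) can be sketched like this. The function name parse_slurm_state is hypothetical and this is not the code from the playbook; the state names are standard SLURM job states, and the handling of a trailing "+" or " by <uid>" reflects how sacct can decorate a state.

```python
import re

# Standard SLURM job states; anything else on a line is treated as noise.
SLURM_STATES = {
    "BOOT_FAIL", "CANCELLED", "COMPLETED", "COMPLETING", "DEADLINE", "FAILED",
    "NODE_FAIL", "OUT_OF_MEMORY", "PENDING", "PREEMPTED", "REQUEUED",
    "RESIZING", "REVOKED", "RUNNING", "SUSPENDED", "TIMEOUT",
}

# A whole line must be a state, optionally decorated as "CANCELLED+" or
# "CANCELLED by 1234"; banner text that merely starts with a state word fails.
_STATE_LINE = re.compile(r"(?P<state>[A-Z_]+)\+?(?: by \d+)?")


def parse_slurm_state(output):
    """Return the first recognised SLURM state in `output`, or None."""
    for line in output.splitlines():
        match = _STATE_LINE.fullmatch(line.strip())
        if match and match.group("state") in SLURM_STATES:
            return match.group("state")
    return None


noisy = "Welcome to the cluster!\nFAILED to load motd\n\n  RUNNING  \n"
print(parse_slurm_state(noisy))
```

Returning None when no line matches lets a polling loop distinguish "no state seen yet" from a real state, rather than comparing the whole banner-polluted string against an expected value.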
- 05_version-control_02_release-automation.md: replace deprecated sync-remote / fix-mismatches with the unified ecosystem packages command (--hosts, --packages, --dry-run, --update-hosts)
- 99_quality_03_packaging-bar.md: add §5a wheel-vs-source data-file audit (scitex_dev.audit_package_data) catching SKILL.md and other data-file drift before publish
Summary
Extract the SLURM dispatch logic out of scitex_dev.test_runner into a standalone scitex-hpc v0.1.0 package, bridged via sys.modules alias.
API
- JobConfig dataclass with SCITEX_HPC_* env-var override resolution
- bash -lc to load SLURM modules, then srun/sbatch
Test plan
- python -c "import scitex.hpc as h; import scitex_hpc as r; assert h is r": passes
- srun --partition=cascade hostname dispatched via bash -lc 'srun ...' over SSH, ran on spartan-bm022 (verified before extraction)
Why
The functions were generic SLURM dispatch, not specific to dev tooling. They should be reusable from any consumer, not just scitex_dev. The umbrella re-export means from scitex.hpc import srun works in any package.
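The sys.modules alias that makes the identity check in the test plan pass can be demonstrated with throwaway modules. The names umbrella and standalone_pkg below are hypothetical stand-ins for scitex and scitex_hpc; how the real packages register the alias is not shown in this PR description, so treat this as an illustration of the mechanism only.

```python
import sys
import types

# Stand-in for the standalone package (plays the role of scitex_hpc).
standalone = types.ModuleType("standalone_pkg")
standalone.srun = lambda *args: ("srun", *args)
sys.modules["standalone_pkg"] = standalone

# Stand-in for the umbrella package (plays the role of scitex).
umbrella = types.ModuleType("umbrella")
umbrella.__path__ = []  # an empty __path__ marks the module as a package
sys.modules["umbrella"] = umbrella

# The bridge: register the very same module object under the umbrella name,
# and expose it as an attribute so "import umbrella.hpc as h" resolves.
sys.modules["umbrella.hpc"] = standalone
umbrella.hpc = standalone

import umbrella.hpc as h
import standalone_pkg as r
from umbrella.hpc import srun

assert h is r  # same object, not a copy
print(srun("hostname"))
```

Because both names point at one module object, state such as module-level caches is shared, which is the property the assert h is r check verifies.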